Connectome is an R package to automate ligand-receptor mapping and visualization in single-cell data. It is built to work with Seurat from Satija Lab.
require(Seurat)
require(SeuratData)
require(connectome)
require(ggplot2)
require(cowplot)
CreateConnectome uses the active identity slot in a Seurat 3.0 object to define “nodes” for network creation and network analysis.
Node parcellation should be done carefully as it will have a non-trivial effect on downstream connectomics.
Here, we run Connectome on the cross-platform Pancreas data distributed by SeuratData:
InstallData('panc8')
data('panc8')
table(Idents(panc8))
#>
#> celseq celseq2 fluidigmc1 indrop smartseq2
#> 1004 2285 638 8569 2394
Idents(panc8) <- panc8[['celltype']]
table(Idents(panc8))
#>
#> gamma acinar alpha
#> 625 1864 4615
#> delta beta ductal
#> 1013 3679 1954
#> endothelial activated_stellate schwann
#> 296 474 25
#> mast macrophage epsilon
#> 56 79 30
#> quiescent_stellate
#> 180
Here, we normalize the data, identify the ligand and receptor genes that have mapped in the object, scale those genes, and generate the connectome. To accelerate computation time, we omit calculation of Wilcoxon rank p-values and gene-wsie diagnostic odds ratios. We also limit our connectomic analysis to only those clusters with > 75 cells captured.
panc8 <- NormalizeData(panc8)
connectome.genes <- union(connectome::ncomms8866_human$Ligand.ApprovedSymbol,connectome::ncomms8866_human$Receptor.ApprovedSymbol)
genes <- connectome.genes[connectome.genes %in% rownames(panc8)]
panc8 <- ScaleData(panc8,features = genes)
panc8.con <- CreateConnectome(panc8,species = 'human',min.cells.per.ident = 75,p.values = F,calculate.DOR = F)
#>
|
| | 0%
|
|====== | 10%
|
|============= | 20%
|
|==================== | 30%
|
|========================== | 40%
|
|================================ | 50%
|
|======================================= | 60%
|
|============================================== | 70%
|
|==================================================== | 80%
|
|========================================================== | 90%
|
|=================================================================| 100%
This is a critical step which is both dataset- and scientific question-dependent. First, let’s look at the distribution of ligand and receptor z-scores:
# Change density plot line colors by groups
p1 <- ggplot(panc8.con, aes(x=ligand.scale)) + geom_density() + ggtitle('Ligand.scale')
p2 <- ggplot(panc8.con, aes(x=recept.scale)) + geom_density() + ggtitle('Recept.scale')
p3 <- ggplot(panc8.con, aes(x=percent.target)) + geom_density() + ggtitle('Percent.target')
p4 <- ggplot(panc8.con, aes(x=percent.source)) + geom_density() + ggtitle('Percent.source')
plot_grid(p1,p2,p3,p4)
Given these outputs, let’s filter the data to only include edges with a z-score above 0.25, and with both the ligand and receptor expressed in at least 10% of the cells in their respective cluster:
panc8.con2 <- FilterConnectome(panc8.con,min.pct = 0.1,min.z = 0.25,remove.na = T)
We are now ready to begin making plots of this single-tissue system.
This is the simplest form of network visualization, but it can be difficult to read. This function will take as input any parameters which can be passed to FilterConnectome, thereby allowing in-line filtration. Here, we exclusievely look at cell-cell edges involving VEGFA. We also show the effect of filtration based on percent expression:
NetworkPlot(panc8.con2,features = 'VEGFA',min.pct = 0.1,weight.attribute = 'weight_sc')
NetworkPlot(panc8.con2,features = 'VEGFA',min.pct = 0.75,weight.attribute = 'weight_sc')
This function performs a graph-theory analysis on an entire network. It ranks producers and receivers within each signaling family (‘mode’) based on cumulative incomine/outgoing edgeweight and Kleinberg centrality. This allows quick identification of top ligand producers and correlating receptor receivers, in each signaling family.
ModalDotPlot(panc8.con2,modes.include = NULL,min.z = NULL,weight.attribute = 'weight_sc')
ModalDotPlot can also be used to exclusively look at signlaing families of particular interest:
ModalDotPlot(panc8.con2,modes.include = c('VEGF','WNT','Semaphorins','NOTCH','FGF'),weight.attribute = 'weight_sc',min.z = 0)
If a particular cell-to-cell interaction is of particular interest, CellCellScatter can be useful to immediately identify top signaling modes between those two cells:
p1 <- CellCellScatter(panc8.con2,sources.include = 'endothelial',targets.include = 'activated_stellate',
label.threshold = 3,
weight.attribute = 'weight_sc',min.pct = 0.25,min.z = 2)
p1 <- p1 + xlim(0,NA) + ylim(0,NA)
p1
From the above plot, we can see that endothelial cells are prominently expressing DLL4, which can be sensed by NOTCH3 receptor on ativated stellate cells.
It might then be important to ask if the DLL4-NOTCH3 interaction from the above example is exclusive to the endothelial-activated stellate cell vector. It is possible that although it is a top mechanism between these two cells, there may be another cell-cell vector which more strongly utilizes this pathway. To test this hypothesis, we can use ‘SignalScatter’ to investigate the entire DLL4-NOTCH3 network across all cell types. We also choose to include additional ligands which can be sensed by NOTCH3:
p2 <- SignalScatter(panc8.con2, features = c('JAG1','JAG2','DLL4'),label.threshold = 1,weight.attribute = 'weight_sc',min.z = 2)
p2 <- p2 + xlim(2,NA) + ylim(2,NA)
p2
From this plot, we see that DLL4-NOTCH3 is most likely mediating communication between endothelial cells and both activated and quiescent stellate cells. Further, we can see that NOTCH4 is likley mediating endothelial autocrine communication, both via DLL4 and other ligands (JAG1 and JAG2).
The clearest way we have found to visualize cell-cell connectivity in scRNAseq data is through circos plots made with the R package circlize. We have therefore built a versatile CircosPlot function into the Connectome package.
First, let’s select the top 5 signaling vectors for each cell-cell vector:
test <- panc8.con2
test <- data.frame(test %>% group_by(vector) %>% top_n(5,weight_sc))
Then, let’s focus in on specific cell types of interest:
cells.of.interest <- c('endothelial','activated_stellate','quiescent_stellate','macrophage')
CircosPlot can make 4 distinct plot types, with subtle quantitative differences:
# Using edgeweights from normalized slot:
CircosPlot(test,weight.attribute = 'weight_norm',sources.include = cells.of.interest,targets.include = cells.of.interest,lab.cex = 0.6)
# Using edgeweights from scaled slot:
CircosPlot(test,weight.attribute = 'weight_sc',sources.include = cells.of.interest,targets.include = cells.of.interest,lab.cex = 0.6)
# Using separate ligand and receptor expression (from normalized slot)
CircosPlot(test,weight.attribute = 'weight_norm',sources.include = cells.of.interest,targets.include = cells.of.interest,balanced.edges = F,lab.cex = 0.6)
# Using separate ligand and receptor expression (from scaled slot)
CircosPlot(test,weight.attribute = 'weight_sc',sources.include = cells.of.interest,targets.include = cells.of.interest,balanced.edges = F,lab.cex = 0.6)
Although the 4 plots look similar, each is leveraging slightly different quantitative information. Appropriate use should be determined by the scientific question of interest.
Connectome can be used to visualize the cell-type specific “niche” network for a given cell type, showing all possible communication pathways converging on a given cell. This can be done leveraging the targets.include argument built into many of the above functions:
CircosPlot(test,targets.include = 'endothelial',lab.cex = 0.6)
Similarly, the argument sources.include can be used to visualize all edges coming from a given cell type:
CircosPlot(test,sources.include = 'endothelial',lab.cex = 0.6)
Although it can be difficult to intepret, Connectome can also be used to provide large-scale overviews of intercellular signaling networks in scRNAseq data:
CircosPlot(panc8.con2,min.z=1.5)